Minimum-variance estimators for the parameter $f_{\rm nl}$ that quantifies local-model non-Gaussianity
can be constructed from the cosmic microwave background (CMB) bispectrum (three-point function)
and also from the trispectrum (four-point function). Some have suggested that a comparison between
the estimates for the values of $f_{\rm nl}$ from the bispectrum and trispectrum allows a consistency test for
the model. But others argue that the saturation of the Cramer-Rao bound by the bispectrum
estimator implies that no further information on $f_{\rm nl}$ can be obtained from the trispectrum. Here we
elaborate the nature of the correlation between the bispectrum and trispectrum estimators for $f_{\rm nl}$.
We show that the two estimators become statistically independent in the limit of a large number of
CMB pixels and thus that the trispectrum estimator does indeed provide additional information on
$f_{\rm nl}$ beyond that obtained from the bispectrum. We explain how this conclusion is consistent with
the Cramer-Rao bound. Our discussion of the Cramer-Rao bound may be of interest to those doing
Fisher-matrix parameter-estimation forecasts or data analysis in other areas of physics as well.

PACS numbers: 



I. INTRODUCTION 

Observations of the cosmic microwave background
(CMB) have confirmed a now 'standard' cosmological
model [1]. A key aspect of this model is that primordial
fluctuations are a realization of a Gaussian random field.
This implies that CMB fluctuations are completely characterized
by their two-point correlation function $C(\theta)$ in
real space, or equivalently, the power spectrum $C_l$ in harmonic
space. All higher-order $N$-point correlation functions
with even $N$ can be written in terms of the two-point
function, and all $N$-point correlation functions with
odd $N$ are zero.

But while the simplest single-field slow-roll (SFSR)
inflationary models assumed in the standard cosmological
model predict departures from Gaussianity to
be undetectably small [2], several beyond-SFSR models
predict departures from Gaussianity to be larger
[3], and possibly detectable with current or forthcoming
CMB experiments. While the range of predictions
for non-Gaussianity is large, the local model for
non-Gaussianity [4] (that which appears in arguably the simplest
beyond-SFSR models) has become the canonical
model for most non-Gaussianity searches. The
non-Gaussianity is parametrized in these models by a
non-Gaussian amplitude $f_{\rm nl}$ to be defined more precisely below.

Most efforts to measure $f_{\rm nl}$ have relied on an estimator
constructed from the CMB bispectrum, the three-point
correlation function in harmonic space. However,
the local model also predicts a non-zero trispectrum (the
harmonic-space four-point function) [5-9], and efforts
have recently been mounted to determine $f_{\rm nl}$ from the
trispectrum [10]. It has been suggested, moreover, that a
comparison of the values of $f_{\rm nl}$ obtained from the bispectrum
and trispectrum can be used as a consistency test
for the local model @, QjJ [Tl|. 

However, it can be shown that the bispectrum estimator
for $f_{\rm nl}$ saturates the Cramer-Rao bound, and it has
been argued that this implies that no new information
on the value of $f_{\rm nl}$, beyond that obtained from the bispectrum,
can be obtained from the trispectrum [12, 13].
Ref. [13] further outlines the nature of the correlation
between the bispectrum and trispectrum $f_{\rm nl}$ estimators
implied by this conclusion.

Here we show that the trispectrum does provide additional
information on $f_{\rm nl}$; i.e., it is not redundant with
that from the bispectrum. We show that there is indeed
a correlation between the bispectrum and trispectrum
$f_{\rm nl}$ estimators, elaborating the arguments of Ref. [13].
However, we show with analytic estimates and numerical
calculations that this correlation becomes weak in the
high-statistics limit. We explain, with a simple example,
how additional information on $f_{\rm nl}$ can be provided
by the trispectrum given that the bispectrum estimator
for $f_{\rm nl}$ saturates the Cramer-Rao bound. Put simply, the
Cramer-Rao inequality bounds the variance with which
a distribution can be measured, but there may be additional
information in a distribution, about a theory or
its parameters, beyond the distribution variance. The
discussion of the Cramer-Rao bound and the examples
we work out in Section II may be of interest to a much
broader audience of readers than just those interested in
CMB non-Gaussianity.

The outline of this paper is as follows: We begin in
Section II with our discussion of the Cramer-Rao bound.
The aim of the rest of the paper is to illustrate explicitly
the nature of the correlation between the bispectrum
estimator for $f_{\rm nl}$ and the trispectrum estimator for $f_{\rm nl}^2$
and to show that the correlation becomes small in the
high-statistics limit. In Section III we introduce our conventions
for the bispectrum and trispectrum. In Section IV
we derive the minimum-variance estimators for
$f_{\rm nl}$ from the bispectrum and trispectrum and evaluate
the noises in each. We also write down approximations
for the estimators and noises valid for the local model.
In Section V we explain the nature of the correlation
between the bispectrum and trispectrum estimators for
$f_{\rm nl}$. We then show that this correlation becomes weak
(scaling as $(\ln N_{\rm pix})^{-1}$) as the number $N_{\rm pix}$ of pixels
becomes large. We conclude in Section VI. Appendix A
details the correspondence between continuum and discrete
Fourier conventions for power spectra, bispectra,
and trispectra, and Appendix B describes the
numerical evaluation of the correlation.



II. THE CRAMER-RAO BOUND

In the Sections below we will demonstrate that the estimators
for $f_{\rm nl}$ and $f_{\rm nl}^2$ become statistically independent
with sufficiently good statistics. However, the bispectrum
estimator for $f_{\rm nl}$ saturates the Cramer-Rao bound,
and it has been argued that this saturation implies that
no further information about $f_{\rm nl}$, beyond that obtained
from the bispectrum, can be obtained from the trispectrum
[12, 13]. Here we explain that the Cramer-Rao
inequality bounds only the variance with which $f_{\rm nl}$ can
be measured; additional information, beyond the variance,
can be obtained from measurement of $f_{\rm nl}^2$ from
the trispectrum.

To illustrate, consider, following Ref. the analo- 
gous problem of determining f n \ and f n 2 from a one- 
dimensional version of the local model. Suppose we have
a random variable $X$ written in terms of a Gaussian random
variable $x$ of zero mean ($\langle x\rangle = 0$) and unit variance
($\langle x^2\rangle = 1$) as $X = x + \epsilon(x^2 - 1)$. Here, $\epsilon$ parametrizes
the departure from the null hypothesis $\epsilon = 0$. The PDF
for $X$, for a given $\epsilon$, is

$$ P(X|\epsilon) = \frac{1}{\sqrt{2\pi\left[1 + 4\epsilon(X+\epsilon)\right]}} \left[ e^{-x_+^2/2} + e^{-x_-^2/2} \right], \qquad (1) $$

where

$$ x_\pm = \frac{1}{2\epsilon}\left[ \pm\sqrt{1 + 4\epsilon(X+\epsilon)} - 1 \right]. \qquad (2) $$

The logarithm of the PDF can then be Taylor expanded
about $\epsilon = 0$ as

$$ \ln P(X|\epsilon) = -\frac{X^2}{2} + \epsilon I_1(X) - \frac{\epsilon^2}{2} I_2(X) + O(\epsilon^3), \qquad (3) $$

where $I_1(X) \equiv X^3 - 3X$, and $I_2(X) \equiv 5X^4 + 5 - 14X^2$.
It will be useful below to note that the expectation values
of these quantities in the weakly non-Gaussian limit are
$\langle I_1\rangle = 6\epsilon + O(\epsilon^3)$ and $\langle I_2\rangle = 6 + 272\,\epsilon^2 + O(\epsilon^4)$.
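The two expectation values just quoted can be checked with exact Gaussian moments. The following is a minimal sketch, not part of the original analysis; the helper names (`gaussian_moment`, `moment_X`, `lincomb`, `expect_I1`, `expect_I2`) are illustrative assumptions. It expands $\langle X^k\rangle$ for $X = x + \epsilon(x^2-1)$ as a polynomial in $\epsilon$ and assembles $\langle X^3 - 3X\rangle$ and $\langle 5X^4 - 14X^2 + 5\rangle$.

```python
from math import comb


def gaussian_moment(n):
    """E[x^n] for a zero-mean, unit-variance Gaussian: (n-1)!! if n is even, else 0."""
    if n % 2:
        return 0
    result = 1
    for k in range(n - 1, 0, -2):
        result *= k
    return result


def moment_X(k):
    """Coefficients [c0, c1, ..., ck] of E[X^k] as a polynomial in eps,
    for X = x + eps * (x^2 - 1) (binomial expansion, term by term)."""
    coeffs = []
    for j in range(k + 1):
        # E[x^(k-j) * (x^2 - 1)^j], expanding (x^2 - 1)^j binomially
        inner = sum(
            comb(j, m) * (-1) ** (j - m) * gaussian_moment(k - j + 2 * m)
            for m in range(j + 1)
        )
        coeffs.append(comb(k, j) * inner)
    return coeffs


def lincomb(terms, length):
    """Weighted sum of coefficient lists, zero-padded to a common length."""
    out = [0] * length
    for weight, coeffs in terms:
        for i, c in enumerate(coeffs):
            out[i] += weight * c
    return out


def expect_I1():
    """Coefficients in eps of <X^3 - 3X>."""
    return lincomb([(1, moment_X(3)), (-3, moment_X(1))], 4)


def expect_I2():
    """Coefficients in eps of <5X^4 - 14X^2 + 5>."""
    return lincomb([(5, moment_X(4)), (-14, moment_X(2)), (5, [1])], 5)


print(expect_I1())  # coefficients of eps^0 .. eps^3
print(expect_I2())  # coefficients of eps^0 .. eps^4
```

The leading entries reproduce $6\epsilon$ and $6 + 272\,\epsilon^2$; the remaining entries are the higher-order corrections that the text absorbs into $O(\epsilon^3)$ and $O(\epsilon^4)$.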

Now suppose we have a realization consisting of $N$ data
points $X_i$, each drawn independently from the distribution
in Eq. (1), and let's arrange these data points into a
vector $\mathbf{X}$. The PDF for this realization, for a given $\epsilon$, is

$$ \ln P(\mathbf{X}|\epsilon) = \sum_i \left[ -\frac{X_i^2}{2} + \epsilon I_1(X_i) - \frac{\epsilon^2}{2} I_2(X_i) + O(\epsilon^3) \right]. \qquad (4) $$



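The Cramer-Rao inequality states that the smallest
variance $\mathrm{Var}(\hat\epsilon) \equiv \langle \hat\epsilon^2\rangle - \langle\hat\epsilon\rangle^2$ of an estimator $\hat\epsilon$ is

$$ \mathrm{Var}(\hat\epsilon) \geq \frac{1}{F}, \qquad (5) $$

where

$$ F = \int \left[\frac{\partial \ln P(\mathbf{X}|\epsilon)}{\partial \epsilon}\right]^2 P(\mathbf{X}|\epsilon)\, d\mathbf{X} \equiv \left\langle \left[\frac{\partial \ln P(\mathbf{X}|\epsilon)}{\partial \epsilon}\right]^2 \right\rangle \qquad (6) $$

is the Fisher information. Here, the angle brackets denote
an expectation value with respect to the null-hypothesis
($\epsilon = 0$) PDF. Applying Eq. (6) to Eq. (4), we find

$$ F = \sum_i \left\langle [I_1(X_i)]^2 \right\rangle = 6N, \qquad (7) $$

from which we infer

$$ \mathrm{Var}(\hat\epsilon) \geq \frac{1}{6N}. \qquad (8) $$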

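This model predicts a skewness $\langle I_1\rangle = \langle X^3 - 3X\rangle = 6\epsilon$,
and so we can construct an estimator for $\epsilon$ from the
measured skewness as follows:

$$ \hat\epsilon_s = \frac{1}{6N} \sum_i \left( X_i^3 - 3X_i \right). \qquad (9) $$

The variance of this estimator is $\mathrm{Var}(\hat\epsilon_s) = (6N)^{-1}$, and
so this estimator saturates the Cramer-Rao bound.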

In retrospect, this saturation should come as no surprise.
According to Eqs. (4) and (6), the Fisher
information (and thus the minimum variance with which
$\epsilon$ can be measured) is determined entirely by the term in
$\ln P(\mathbf{X}|\epsilon)$ linear in $\epsilon$ which, in this case, is precisely the
skewness. Thus, the terms in $\ln P(\mathbf{X}|\epsilon)$ that are higher
order in $\epsilon$ contribute nothing to the Fisher information.
And since the term linear in $\epsilon$ multiplies the skewness, $\hat\epsilon_s$
saturates the Cramer-Rao bound.
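The saturation of the bound by the skewness estimator can also be checked by direct simulation. The sketch below is illustrative only: the function names, the sample sizes, and the seeds are assumptions chosen for a quick run, and the check is statistical rather than exact. It draws realizations of $X = x + \epsilon(x^2-1)$, applies the estimator $\frac{1}{6N}\sum_i (X_i^3 - 3X_i)$, and compares its scatter under the null hypothesis with $1/(6N)$.

```python
import random
import statistics


def draw_X(rng, eps):
    """One draw of X = x + eps * (x^2 - 1) with x a unit Gaussian."""
    x = rng.gauss(0.0, 1.0)
    return x + eps * (x * x - 1.0)


def skewness_estimator(sample):
    """Skewness-based estimator of eps: sum_i (X_i^3 - 3 X_i) / (6 N)."""
    n = len(sample)
    return sum(X ** 3 - 3.0 * X for X in sample) / (6.0 * n)


def simulate(eps, n_points, n_trials, seed=0):
    """Mean and variance of the estimator over independent realizations."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        sample = [draw_X(rng, eps) for _ in range(n_points)]
        estimates.append(skewness_estimator(sample))
    return statistics.fmean(estimates), statistics.pvariance(estimates)


N_POINTS, N_TRIALS = 200, 2000  # hypothetical sizes, small enough for a quick run

# Null hypothesis: the variance should sit close to the bound 1 / (6 N).
mean_null, var_null = simulate(0.0, N_POINTS, N_TRIALS)
ratio_to_bound = var_null * 6 * N_POINTS

# Small signal: the estimator should recover eps on average.
mean_signal, _ = simulate(0.05, N_POINTS, N_TRIALS, seed=1)

print(ratio_to_bound, mean_null, mean_signal)
```

With a few thousand realizations the ratio fluctuates around unity at the level of a few percent, which is the expected sampling noise of a variance estimate.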
FIG. 1: Here we plot two probability distribution functions
that share the same skewness but with two different values 
for the kurtosis. 



But this does not mean that there is no information
about $\epsilon$ from these higher-order terms. Consider, for example,
a more general PDF,

$$ \ln P_g(X|\epsilon,\epsilon_1^2) = -\frac{X^2}{2} + \epsilon I_1(X) - \frac{\epsilon_1^2}{2} I_2(X), \qquad (10) $$

parametrized by $\epsilon_1^2$, in addition to the parameter $\epsilon$. This
PDF differs from the PDF in Eq. (3) in the coefficient
of $I_2(X)$. In the weakly non-Gaussian limit, the skewness
of this PDF is $\langle I_1(X)\rangle = 6\epsilon$, and its "kurtosis" is
$\langle I_2(X)\rangle = 6 + 846\,\epsilon^2 - 574\,\epsilon_1^2 + 18(\epsilon_1^2 - \epsilon^2)$.$^1$ If we fix
$\epsilon$, we then have a family of PDFs, parametrized by $\epsilon_1$,
that all have the same skewness but with different values
of the kurtosis. Fig. 1 shows two PDFs that have the
same skewness but different kurtoses. These are clearly
two very different distributions; qualitatively, the large-$X$
tails are suppressed as $\epsilon_1$ is increased.

The estimator in Eq. (9) once again gives us the optimal
estimator for $\epsilon$ in this new PDF, but we can now
also measure from the data the kurtosis, the expectation
value of $I_2(X)$, which provides an estimator for
$846\,\epsilon^2 - 574\,\epsilon_1^2 + 18(\epsilon_1^2 - \epsilon^2)$. This can then be used in
combination with the skewness estimator for $\epsilon$ to obtain
an estimator for $\epsilon_1^2$. According to the Cramer-Rao inequality,
the smallest variance of $\epsilon_1^2$ that can be obtained
is

$$ \mathrm{Var}(\epsilon_1^2) \geq \left\{ \int \left[\frac{\partial \ln P_g(\mathbf{X}|\epsilon,\epsilon_1^2)}{\partial \epsilon_1^2}\right]^2 P_g(\mathbf{X}|\epsilon,\epsilon_1^2)\, d\mathbf{X} \right\}^{-1} = \frac{1}{278\,N}. \qquad (11) $$

Note that we cannot apply the Cramer-Rao bound to the
parameter $\epsilon_1$, rather than $\epsilon_1^2$, as $\partial P_g(\mathbf{X}|\epsilon,\epsilon_1^2)/\partial\epsilon_1$ is zero
under the null hypothesis $\epsilon_1 = 0$, thus violating one of
the conditions for the Cramer-Rao inequality to apply.
Since $\epsilon_1^2$, not $\epsilon_1$, is determined by the data, the distribution
function for $\epsilon_1^2$ (not $\epsilon_1$) will approach a Gaussian
distribution in the large-$N$ limit.

1 In this paper we use the term "kurtosis" to denote the expectation
value of $I_2(X)$. This is qualitatively similar to, but slightly
different from, the usual kurtosis, which is usually defined to be
the expectation value of $X^4 - 6X^2 + 3$.

The covariance between $\epsilon$ and $\epsilon_1^2$ is zero, as the former
is odd in $X$ and the latter even. Still, this does
not necessarily imply that the two are statistically independent,
as there is still a covariance between $\epsilon^2$
and $\epsilon_1^2$. However, this becomes small as $N$ becomes
large. The correlation coefficient in this example is
$r \equiv \mathrm{Cov}(\epsilon^2,\epsilon_1^2)/\sqrt{\mathrm{Var}(\epsilon^2)\,\mathrm{Var}(\epsilon_1^2)} \simeq 6\,N^{-1/2}$. Thus, for
large $N$, $\epsilon$ and $\epsilon_1$ are two statistically independent quantities
that can be obtained from the data and then compared
with the local-model prediction that $\epsilon_1^2 = \epsilon^2$. In
brief, the skewness and kurtosis are two different quantities
that can be obtained from a measured distribution.
In the limit of large $N$, no measurement of the skewness,
no matter how precise, can tell us anything about the
kurtosis, and vice versa.
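The numerical coefficients that appear in this example (846, 574, 278, and the factor of roughly 6 in the correlation coefficient) all follow from moments of a unit Gaussian. The sketch below is an illustration under stated assumptions, not the authors' calculation: it takes the estimators to be sample means of $X^3 - 3X$ and $5X^4 - 14X^2 + 5$, works to leading order in $1/N$ under the null hypothesis, approximates $\mathrm{Var}(\hat\epsilon^2) \simeq 2\,[\mathrm{Var}(\hat\epsilon)]^2$, and reports only the magnitude of $r\sqrt{N}$. All function and variable names are hypothetical.

```python
from math import sqrt


def gaussian_moment(n):
    """E[x^n] for a zero-mean, unit-variance Gaussian."""
    if n % 2:
        return 0
    result = 1
    for k in range(n - 1, 0, -2):
        result *= k
    return result


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (index = power)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def expect(p):
    """Expectation of a polynomial in X under the null (unit Gaussian) PDF."""
    return sum(c * gaussian_moment(k) for k, c in enumerate(p))


I1 = [0, -3, 0, 1]       # X^3 - 3X
I2 = [5, 0, -14, 0, 5]   # 5X^4 - 14X^2 + 5

mean_I1_sq = expect(poly_mul(I1, I1))              # Fisher information per point for eps
mean_I2 = expect(I2)
var_I2 = expect(poly_mul(I2, I2)) - mean_I2 ** 2
fisher_eps1_sq = var_I2 / 4                        # per point, for the coefficient of -I2/2

# Leading-order coefficients of eps^2 and eps_1^2 in the kurtosis
coeff_eps_sq = expect(poly_mul(poly_mul(I1, I1), I2)) / 2
coeff_eps1_sq = expect(poly_mul(I2, I2)) / 2

# |r| * sqrt(N) between the squared skewness estimator and the kurtosis estimator
cross = expect(poly_mul(poly_mul(I1, I1), I2)) - mean_I1_sq * mean_I2
r_times_sqrtN = cross / (mean_I1_sq * sqrt(2 * var_I2))

print(mean_I1_sq, fisher_eps1_sq, coeff_eps_sq, coeff_eps1_sq, r_times_sqrtN)
```

Under these assumptions the last number comes out close to 5.85, consistent with the quoted $r \simeq 6\,N^{-1/2}$.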

In this example, a one-sigma excursion in $\epsilon$ from a measurement
with $N$ data points is $\mathrm{Var}^{1/2}(\epsilon) = (6N)^{-1/2}$,
and this is smaller than $\mathrm{Var}^{1/4}(\epsilon_1^2) = (278\,N)^{-1/4}$, the
square root of the one-sigma excursion in $\epsilon_1^2$, for any
$N \gtrsim$ few. Thus, the skewness will provide better sensitivity
if we are simply trying to detect a departure from
the null hypothesis $\epsilon = 0$; measurement of $\epsilon_1^2$ will not add
much in this case. Still, if $\epsilon$ is measured with high statistical
significance from the skewness, then measurement
of $\epsilon_1^2$ can, with sufficient statistics, provide a statistically
independent determination of $\epsilon^2$ and/or an independent
test of the theory.

Now consider another PDF,

$$ \ln P_{\rm small}(X|\epsilon) = -\frac{X^2}{2} + 10^{-2}\,\epsilon I_1(X) - \frac{\epsilon^2}{2} I_2(X) + O(\epsilon^3), \qquad (12) $$

that differs from the local-model PDF in the suppression
we have inserted for the term linear in $\epsilon$, which thus
suppresses the skewness. Application of the Cramer-Rao
inequality in this case tells us that the smallest value
of $\epsilon$ that can be distinguished from the null hypothesis
($\epsilon = 0$) is $10^{2}/\sqrt{6N}$, and we know from the discussion
above that this variance is obtained via measurement of
the skewness. However, $\epsilon^2$, the coefficient of the second
term in the expansion (that obtained from measurement
of the kurtosis), can be obtained with the variance given
above. Thus, in this case, estimation of $\epsilon^2$ via measurement
of the kurtosis provides a more sensitive probe of
a departure from the null hypothesis $\epsilon = 0$ than does estimation
of $\epsilon$ from measurement of the skewness, as long
as $N < 10^7$. Note that the Cramer-Rao bound is not
violated in this case, as measurement of $\epsilon^2$, which does
not discriminate between positive and negative values of
$\epsilon$, does not provide any further information on $\mathrm{Var}(\epsilon)$.
The apparent violation of the Cramer-Rao bound arises
in this case because one of the conditions for the validity
of the Cramer-Rao bound (that $\partial \ln P/\partial\epsilon$ be non-zero at
$\epsilon = 0$, i.e., under the null hypothesis) is becoming invalid
as the numerical coefficient of $\epsilon$ in $\ln P$ is made smaller.
Had we chosen that coefficient to be zero, rather than
$10^{-2}$, then the Cramer-Rao inequality would have given
a nonsensical bound for $\mathrm{Var}(\epsilon)$.



A. Summary

Suppose we have a theory that predicts new effects
parametrized by a quantity $\epsilon$, with $\epsilon = 0$ representing
the null hypothesis. A general PDF for the data $X$ given
$\epsilon$ (or likelihood for $\epsilon$ for given data $X$) can be expanded
in $\epsilon$ as $\ln P(X|\epsilon) = \ln P_0(X) + \epsilon\, g(X) + \epsilon^2 h(X) + \cdots$,
where $P_0(X)$ is the PDF under the null hypothesis $\epsilon = 0$
and $g(X)$ and $h(X)$ are functions that describe the theory.
Estimation of $\epsilon$ can be obtained through measurement
of the mean value of $g(X)$, and an independent
estimation of $\epsilon^2$ can, with sufficiently good statistics, be
obtained from measurement of the mean value of $h(X)$.
If $\langle [g(X)]^2\rangle > \langle [h(X)]^2\rangle$, where the expectation value
is with respect to $P_0$, then measurement of the mean
value of $g(X)$ will provide a more sensitive avenue for
detection of a value of $\epsilon$ that departs from the null hypothesis
than measurement of the mean value of $h(X)$. If
$\langle [g(X)]^2\rangle < \langle [h(X)]^2\rangle$, then measurement of the mean
value of $h(X)$ will provide a more sensitive test for detection
of a value of $\epsilon$ that departs from the null hypothesis.
If the two are comparable, then both tests will be comparable.
In the case of a statistically-significant detection,
there may be, given sufficient statistics, independent information
on the values of $\epsilon$ and $\epsilon^2$ from measurement
of both moments. Care must be taken in interpreting
results of measurement of $\epsilon^2$ from $h(X)$, to note that the
distribution of the $h(X)$ estimator for $\epsilon^2$ is Gaussian in
$\epsilon^2$, not $\epsilon$.

B. Local-model bispectrum and trispectrum

Similar arguments apply, mutatis mutandis, to measurement
of the bispectrum and trispectrum, generalizations
of the skewness and kurtosis: the estimator for $f_{\rm nl}$
obtained from the bispectrum is statistically independent
(for sufficiently large $N_{\rm pix}$) from the estimator for $f_{\rm nl}^2$
obtained from the trispectrum. If the variance to $f_{\rm nl}$ obtained
from the bispectrum is comparable to the square
root of the variance to $f_{\rm nl}^2$ obtained from the trispectrum
[H, [|| , both will have roughly comparable sensitivities toward
detection of a departure from the null hypothesis
$f_{\rm nl} = 0$. If there is a statistically significant detection,
both can provide, with sufficiently good statistics, independent
information on $f_{\rm nl}$ and $f_{\rm nl}^2$, even if the bispectrum
estimator for $f_{\rm nl}$ saturates the Cramer-Rao bound.
We stop short of verifying these claims with the full likelihood
for the local model. However, the arguments given
explicitly for the one-dimensional analog above also apply
to the skewness and kurtosis in the local model, the three-
and four-point functions at zero lag, respectively. While
the skewness and kurtosis are not optimal estimators for
$f_{\rm nl}$ or $f_{\rm nl}^2$, they are statistically independent quantities
that are derived from the bispectrum and trispectrum,
respectively.


C. Another example 

Here we provide another example where statistically-independent
information can be provided for estimators
for $\epsilon$ and $\epsilon^2$, where $\epsilon$ is a parameter that quantifies a
departure from a null hypothesis. Suppose we want to
test a theory in which the decay product from a polarized
particle is predicted to have an angular distribution
$P(\theta) \propto P_0(\theta) + \epsilon P_1(\theta) + \epsilon^2 P_2(\theta)$, where $P_n$ are Legendre
polynomials, and $\epsilon$ parametrizes the departure from
the null hypothesis. In this case, measurement of the
dipole, the mean value of $P_1(x)$, provides an estimator
for $\epsilon$, and measurement of the quadrupole, the mean
value of $P_2(x)$, provides a statistically-independent (with
sufficiently high statistics) estimator for $\epsilon^2$. Thus, measurement
of both the dipole and quadrupole can be used
to test the data, even though the Cramer-Rao inequality
tells us that $\mathrm{Var}(\epsilon)$ is bounded by the value obtained
from the dipole.



III. DEFINITIONS AND CONVENTIONS 

We have argued above that the bispectrum estimator
for $f_{\rm nl}$ and the trispectrum estimator for $f_{\rm nl}^2$ may provide
statistically independent information. The aim of
the rest of the paper will be to evaluate explicitly the
correlation between the bispectrum estimator for $f_{\rm nl}$ and
the trispectrum estimator for $f_{\rm nl}^2$. We will find that it
is nonzero, but that it becomes small in the large-$l_{\max}$
limit.

We assume a flat sky to avoid the complications
(e.g., spherical harmonics, Clebsch-Gordan coefficients,
Wigner 3j and 6j symbols, etc.) associated with a spherical
sky, and we further assume the Sachs-Wolfe limit. We
denote the fractional temperature perturbation at position
$\vec\theta$ on a flat sky by $T(\vec\theta)$, and refer to it hereafter
simply as the temperature.

The temperature in the local model is written,

$$ T(\vec\theta) = t(\vec\theta) + f_{\rm nl}\,[t(\vec\theta)]^2, \qquad (13) $$

in terms of a Gaussian random field $t(\vec\theta)$. Note that our
$f_{\rm nl}$ is three times the definition, in terms of the gravitational
potential, used in most of the literature. We use
this alternative definition to simplify the equations, but
the difference should be noted if comparing our quantitative
results with others. The field $t(\vec\theta)$ has a power
spectrum $C_l$ given by

$$ \langle t_{\vec l_1} t_{\vec l_2} \rangle = \Omega\, \delta_{\vec l_1 + \vec l_2, 0}\, C_l, \qquad (14) $$

where $\Omega = 4\pi f_{\rm sky}$ is the survey area (in steradians), $t_{\vec l}$ is
the Fourier transform of $t(\vec\theta)$, and $\delta_{\vec l_1 + \vec l_2, 0}$ is a Kronecker
delta that sets $\vec l_1 = -\vec l_2$. In the limit $f_{\rm nl} T \ll 1$ (current
constraints are $f_{\rm nl} T \lesssim 10^{-3}$), $C_l$ is also the power
spectrum for $T(\vec\theta)$.

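The bispectrum $B(l_1, l_2, l_3)$ is defined by

$$ \langle T_{\vec l_1} T_{\vec l_2} T_{\vec l_3} \rangle = \Omega\, \delta_{\vec l_1 + \vec l_2 + \vec l_3, 0}\, B(l_1, l_2, l_3). \qquad (15) $$

The Kronecker delta ensures that the bispectrum is defined
only for $\vec l_1 + \vec l_2 + \vec l_3 = 0$; i.e., only for triangles in
Fourier space. Statistical isotropy then dictates that the
bispectrum depends only on the magnitudes $l_1$, $l_2$, $l_3$ of
the three sides of this Fourier triangle. The bispectrum
for the local model is,

$$ B(l_1, l_2, l_3) = 2 f_{\rm nl} \left[ C_{l_1} C_{l_2} + C_{l_1} C_{l_3} + C_{l_2} C_{l_3} \right]. \qquad (16) $$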

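Likewise, the trispectrum is defined by

$$ \langle T_{\vec l_1} T_{\vec l_2} T_{\vec l_3} T_{\vec l_4} \rangle = \Omega\, \delta_{\vec l_1 + \vec l_2 + \vec l_3 + \vec l_4, 0}\, \mathcal{T}(\vec l_1, \vec l_2, \vec l_3, \vec l_4), \qquad (17) $$

and for the local model,

$$ \mathcal{T}(\vec l_1, \vec l_2, \vec l_3, \vec l_4) = f_{\rm nl}^2 \left[ P^{l_1 l_2}_{l_3 l_4}(|\vec l_1 + \vec l_2|) + P^{l_1 l_3}_{l_2 l_4}(|\vec l_1 + \vec l_3|) + P^{l_1 l_4}_{l_2 l_3}(|\vec l_1 + \vec l_4|) \right], \qquad (18) $$

where

$$ P^{l_1 l_2}_{l_3 l_4}(|\vec l_1 + \vec l_2|) = 4\, C_{|\vec l_1 + \vec l_2|} \left[ C_{l_1} C_{l_3} + C_{l_1} C_{l_4} + C_{l_2} C_{l_3} + C_{l_2} C_{l_4} \right]. \qquad (19) $$

Again, the trispectrum is nonvanishing only for $\vec l_1 + \vec l_2 +
\vec l_3 + \vec l_4 = 0$, that is, only for quadrilaterals in Fourier
space.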
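The local-model expressions above are simple enough to evaluate directly. The sketch below is illustrative only; it assumes the forms $B = 2 f_{\rm nl}[C_{l_1}C_{l_2} + C_{l_1}C_{l_3} + C_{l_2}C_{l_3}]$ and $P^{ab}_{cd} = 4\,C_{|\vec a+\vec b|}[C_aC_c + C_aC_d + C_bC_c + C_bC_d]$, a scale-invariant spectrum $C_l = A/l^2$ with $A = 6\times10^{-10}$, and hypothetical function names. It then compares a squeezed triangle and an elongated quadrilateral with the limiting forms $4 f_{\rm nl} C_L C_{l_1}$ and $16 f_{\rm nl}^2 C_L C_{l_1} C_{l_3}$.

```python
import math

A = 6e-10  # power-spectrum normalization assumed for this sketch


def C(l):
    """Scale-invariant flat-sky power spectrum C_l = A / l^2."""
    return A / l ** 2


def norm(v):
    return math.hypot(v[0], v[1])


def add(u, v):
    return (u[0] + v[0], u[1] + v[1])


def bispectrum(l1, l2, l3, fnl=1.0):
    """Local-model bispectrum for a closed triangle l1 + l2 + l3 = 0."""
    c1, c2, c3 = C(norm(l1)), C(norm(l2)), C(norm(l3))
    return 2.0 * fnl * (c1 * c2 + c1 * c3 + c2 * c3)


def P(la, lb, lc, ld):
    """One diagonal term: 4 C_|la+lb| [Ca Cc + Ca Cd + Cb Cc + Cb Cd]."""
    ca, cb, cc, cd = (C(norm(v)) for v in (la, lb, lc, ld))
    return 4.0 * C(norm(add(la, lb))) * (ca * cc + ca * cd + cb * cc + cb * cd)


def trispectrum(l1, l2, l3, l4, fnl=1.0):
    """Local-model trispectrum: sum over the three diagonals of the quadrilateral."""
    return fnl ** 2 * (P(l1, l2, l3, l4) + P(l1, l3, l2, l4) + P(l1, l4, l2, l3))


# Squeezed triangle: short side L = (2, 0), long sides of length ~1000.
L = (2.0, 0.0)
l1 = (0.0, 1000.0)
l2 = (-L[0] - l1[0], -L[1] - l1[1])
b_ratio = bispectrum(l1, l2, L) / (4.0 * C(norm(L)) * C(norm(l1)))

# Elongated quadrilateral: l1 + l2 = L and l3 + l4 = -L.
q1, q2 = (0.0, 1000.0), (2.0, -1000.0)
q3, q4 = (0.0, 700.0), (-2.0, -700.0)
t_ratio = trispectrum(q1, q2, q3, q4) / (
    16.0 * C(norm(add(q1, q2))) * C(norm(q1)) * C(norm(q3))
)

print(b_ratio, t_ratio)
```

Both ratios are close to unity for these configurations, with corrections suppressed by powers of the short-to-long side ratio.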



IV. MINIMUM-VARIANCE 
NON-GAUSSIANITY ESTIMATORS 

We now review how to measure $f_{\rm nl}$ from the bispectrum
and the trispectrum. To keep our arguments clear
(and since the current goal is simply detection of a departure
from Gaussianity, rather than precise evaluation
of $f_{\rm nl}$), we assume the null hypothesis $f_{\rm nl} = 0$ in
the evaluation of noises and construction of estimators.
The generalization to nonzero $f_{\rm nl}$ is straightforward [13].



A. The bispectrum 

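From Eqs. (15) and (16), each triangle $\vec l_1 + \vec l_2 + \vec l_3 = 0$
gives an estimator,

$$ \left(\hat f_{\rm nl}^{\,b}\right)_{123} = \frac{T_{\vec l_1} T_{\vec l_2} T_{\vec l_3}}{\Omega\, B(l_1, l_2, l_3)/f_{\rm nl}}, \qquad (20) $$

with variance [using Eq. (14)],$^2$

$$ \frac{\Omega^3\, C_{l_1} C_{l_2} C_{l_3}}{\left[\Omega\, B(l_1, l_2, l_3)/f_{\rm nl}\right]^2}. \qquad (21) $$

The minimum-variance estimator is constructed by
adding all of these estimators with inverse-variance
weighting. It is

$$ \hat f_{\rm nl}^{\,b} = \sigma_b^2 \sum \frac{T_{\vec l_1} T_{\vec l_2} T_{\vec l_3}\, B(l_1, l_2, l_3)/f_{\rm nl}}{\Omega^2\, C_{l_1} C_{l_2} C_{l_3}}, \qquad (22) $$

and it has inverse variance,

$$ \sigma_b^{-2} = \sum \frac{\left[B(l_1, l_2, l_3)/f_{\rm nl}\right]^2}{\Omega\, C_{l_1} C_{l_2} C_{l_3}}. \qquad (23) $$

The sums in Eqs. (22) and (23) are taken over all distinct
triangles with $\vec l_1 + \vec l_2 + \vec l_3 = 0$. We may then take $\vec L = \vec l_3$ to
be the shortest side of the triangle (i.e., $l_1, l_2 > L$) and
re-write the estimator as,

$$ \hat f_{\rm nl}^{\,b} = \frac{1}{2}\, \sigma_b^2 \sum_{\vec L}\; \sum_{\vec l_1 + \vec l_2 = -\vec L,\; l_1, l_2 > L} \frac{T_{\vec l_1} T_{\vec l_2} T_{\vec L}\, B(l_1, l_2, L)/f_{\rm nl}}{\Omega^2\, C_{l_1} C_{l_2} C_L}, \qquad (24) $$

and the inverse variance as

$$ \sigma_b^{-2} = \frac{1}{2} \sum_{\vec L}\; \sum_{\vec l_1 + \vec l_2 = -\vec L,\; l_1, l_2 > L} \frac{\left[B(l_1, l_2, L)/f_{\rm nl}\right]^2}{\Omega\, C_{l_1} C_{l_2} C_L}. \qquad (25) $$

The factor of 1/2 is included to account for double counting
of identical triangles, those with $\vec l_1 \leftrightarrow \vec l_2$.

2 Here we ignore the negligible contributions from triangles, and
for the trispectrum below, quadrilaterals, where two sides have
the same length. We do, however, include these configurations in
the numerical analysis described in Appendix B and verify that
this assumption is warranted.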



1. Approximation to the Bispectrum Estimator

Now consider the variance $\sigma_b^2$ with which $f_{\rm nl}$ can be
measured from the bispectrum. Take $C_l = A/l^2$ for the
power spectrum, where $A \simeq 6 \times 10^{-10}$ is the power-spectrum
normalization. The bispectrum in Eq. (16) is
maximized for squeezed triangles, those with $L \ll l_1, l_2$,
and thus with $l_1 \simeq l_2$. In this limit, the bispectrum
can be approximated $B(l_1, l_2, L) \simeq 4A^2 f_{\rm nl} L^{-2} l_1^{-2}$. Then,
from Eq. (25) the inverse variance (and thus the signal-to-noise)
is dominated by squeezed triangles, and it is furthermore
dominated by those triangles with the modes
$\vec L$ of the smallest magnitudes $L$.




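FIG. 2: Three triangles that all share a shortest side $\vec L$.

More precisely, let us evaluate the contribution $(\sigma_b^{-2})_{\vec L}$
to the inverse variance obtained from all triangles that
share the same shortest side $\vec L$, as shown in Fig. 2. Since
this contribution is dominated by modes with $l_1 \simeq l_2$,
the inverse variance from these triangles is,

$$ (\sigma_b^{-2})_{\vec L} \simeq \frac{1}{2\Omega} \frac{L^2}{A} \sum_{\vec l_1} \frac{(4\, C_L C_{l_1})^2}{C_{l_1}^2} = \frac{8A}{\Omega L^2} \sum_{\vec l_1} 1 = \frac{2A\, l_{\max}^2}{\pi L^2}, \qquad (26) $$

where we have used $\sum_{\vec l} = \Omega \int d^2l/(2\pi)^2$ in the last line.

The full estimator then sums over all $\vec L$ as in Eq. (24).
The full inverse variance is then

$$ \sigma_b^{-2} = \sum_{\vec L} (\sigma_b^{-2})_{\vec L} = \frac{4}{\pi}\, f_{\rm sky}\, A\, l_{\max}^2 \ln\frac{L_{\max}}{L_{\min}}, \qquad (27) $$

in agreement with Ref. [14].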

To summarize: (1) the signal-to-noise is greatly dominated
by triangles with one side much shorter than the
other two. (2) The signal-to-noise is dominated primarily
by those with the smallest short side. (3) The contribution
to the full signal-to-noise is equal per logarithmic
interval of $L$, the magnitude of the smallest mode in the
triangle. (4) Even if there is a huge number of triangles
that enter the estimator, the error in the estimator is
still dominated by the cosmic variance associated with
the values of $T_{\vec L}$ for the modes $\vec L$ of the smallest $L$.
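Point (3) can be made concrete with a short numerical comparison. The sketch below is an illustration under explicit assumptions: it takes the per-mode contribution to be $2A\,l_{\max}^2/(\pi L^2)$, counts roughly $\Omega L/(2\pi)$ modes per unit interval of $L$ with $\Omega = 4\pi f_{\rm sky}$, and compares the resulting sum with the closed form $(4/\pi) f_{\rm sky} A\, l_{\max}^2 \ln(L_{\max}/L_{\min})$. The function names and the example survey parameters are hypothetical.

```python
import math


def inverse_variance_closed_form(f_sky, A, l_max, L_min, L_max):
    """Closed form assumed here: (4/pi) f_sky A l_max^2 ln(L_max / L_min)."""
    return (4.0 / math.pi) * f_sky * A * l_max ** 2 * math.log(L_max / L_min)


def inverse_variance_mode_sum(f_sky, A, l_max, L_min, L_max):
    """Sum of per-mode contributions 2 A l_max^2 / (pi L^2) over integer L,
    weighting each L by the approximate mode count Omega * L / (2 pi)."""
    omega = 4.0 * math.pi * f_sky
    total = 0.0
    for L in range(L_min, L_max + 1):
        n_modes = omega * L / (2.0 * math.pi)
        total += n_modes * 2.0 * A * l_max ** 2 / (math.pi * L ** 2)
    return total


def sigma_b(f_sky, A, l_max, L_min, L_max):
    """One-sigma error on f_nl (in this paper's convention) from the closed form."""
    return inverse_variance_closed_form(f_sky, A, l_max, L_min, L_max) ** -0.5


# Hypothetical survey parameters, for illustration only.
params = dict(f_sky=1.0, A=6e-10, l_max=2000, L_min=2, L_max=1000)
closed = inverse_variance_closed_form(**params)
summed = inverse_variance_mode_sum(**params)
print(closed, summed, summed / closed)

# Each factor-of-ten interval in L contributes the same amount.
decade_low = inverse_variance_closed_form(1.0, 6e-10, 2000, 2, 20)
decade_high = inverse_variance_closed_form(1.0, 6e-10, 2000, 20, 200)
print(decade_low, decade_high)
```

The discrete sum and the logarithmic closed form agree to within a few percent for these values; the residual is the usual difference between a harmonic sum and its integral.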

Since the variance is dominated by squeezed triangles,
we can approximate the estimator, Eq. (24), as



where 



An 



T 



(28) 



(29) 



B. The trispectrum 

Now consider the trispectrum. Each distinct quadrilateral
$\vec l_1 + \vec l_2 + \vec l_3 + \vec l_4 = 0$ gives an estimator for the
trispectrum with some variance. Adding the individual
estimators with inverse-variance weighting gives the
minimum-variance estimator,$^3$

$$ \widehat{(f_{\rm nl}^2)}^{\,t} = \sigma_t^2 \sum \frac{T_{\vec l_1} T_{\vec l_2} T_{\vec l_3} T_{\vec l_4}\, \mathcal{T}(\vec l_1, \vec l_2, \vec l_3, \vec l_4)/f_{\rm nl}^2}{\Omega^3\, C_{l_1} C_{l_2} C_{l_3} C_{l_4}}, \qquad (30) $$

and the inverse variance,

$$ \sigma_t^{-2} = \sum \frac{\left[\mathcal{T}(\vec l_1, \vec l_2, \vec l_3, \vec l_4)/f_{\rm nl}^2\right]^2}{\Omega^2\, C_{l_1} C_{l_2} C_{l_3} C_{l_4}}. \qquad (31) $$

The sums here are over all distinct quadrilaterals $\vec l_1 + \vec l_2 +
\vec l_3 + \vec l_4 = 0$, and we again neglect quadrilaterals where
two or more sides are the same.

3 Strictly speaking, one must subtract the connected part of the
trispectrum. We omit this term to keep our expression compact,
but it is included in the analytic and numerical calculations of
the variances and covariances discussed below.
Each quadrilateral will have a smallest diagonal, which
we call $\vec L$. The quadrilateral is then described by two
triangles that each share their smallest side $\vec L$; the two
sides of the first triangle will be $\vec l_1$ and $\vec l_2$ and the two
sides of the second triangle will be $\vec l_3$ and $\vec l_4$. We can
then re-write the sums in Eqs. (30) and (31) as

$$ \sum_{\vec L}\; \sum_{\vec l_1 + \vec l_2 = \vec L}\; \sum_{\vec l_3 + \vec l_4 = -\vec L}. \qquad (32) $$

The sum here is only over combinations of $\{\vec l_1, \vec l_2, \vec l_3, \vec l_4\}$
where the lengths of the two other diagonals, $|\vec l_1 + \vec l_4| =
|\vec l_2 + \vec l_3|$ and $|\vec l_1 + \vec l_3| = |\vec l_2 + \vec l_4|$, are both $> L$, so that $L$
is the shortest diagonal [cf. Eq. (15)].




FIG. 3: An example of an elongated quadrilateral with a
shortest diagonal $\vec L$. Note that it is equivalent to two elongated
triangles that share the same shortest side $\vec L$.

Let's now consider the local-model trispectrum given
in Eqs. (18) and (19). The three terms in Eq. (18) sum
over the three diagonals of the quadrilateral. Eq. (19)
then shows that each of these terms is the product of
the power spectrum $C_L$ evaluated for the diagonal (e.g.,
$\vec L = \vec l_1 + \vec l_2 = -\vec l_3 - \vec l_4$) times a sum of products of
power spectra evaluated for each of the quadrilateral
sides. The trispectrum is thus maximized for highly
elongated quadrilaterals, those with $l_i \gg L$, with one
short diagonal, as shown in Fig. 3. The trispectrum for
these elongated quadrilaterals may be approximated as
$\mathcal{T}(\vec l_1, \vec l_2, \vec l_3, \vec l_4) \simeq 16 f_{\rm nl}^2\, C_L C_{l_1} C_{l_3}$.

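Now consider the contribution $(\sigma_t^{-2})_{\vec L}$ to the inverse
variance from all quadrilaterals that share the same
shortest diagonal $\vec L$. Using Eq. (31) and approximating
the trispectrum by the squeezed limit, this is

$$ (\sigma_t^{-2})_{\vec L} \simeq \frac{1}{8} \sum_{\vec l_1} \sum_{\vec l_3} \frac{(16\, C_L C_{l_1} C_{l_3})^2}{\Omega^2\, C_{l_1}^2 C_{l_3}^2} = \frac{2 A^2\, l_{\max}^4}{\pi^2 L^4}. \qquad (33) $$

The factor 1/8 in the first line accounts for the $\vec l_1 \leftrightarrow \vec l_2$
and $\vec l_3 \leftrightarrow \vec l_4$ symmetries and the symmetry under interchange
of the $(\vec l_1, \vec l_2)$ and $(\vec l_3, \vec l_4)$ triangles. Again, the full
variance is obtained by summing over $\vec L$ modes. Thus,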

$$ \sigma_t^{-2} \simeq \frac{2}{\pi^2}\, f_{\rm sky}\, \frac{A^2\, l_{\max}^4}{L_{\min}^2}. \qquad (34) $$

Note that we obtain the $l_{\max}^{-4}$ scaling of the variance noted
in Ref. Recall that $\sigma_t^2$ is a variance to $f_{\rm nl}^2$ (rather
than $f_{\rm nl}$). Thus, the ratio of the smallest $f_{\rm nl}$ detectable
via the trispectrum to the smallest detectable via the bispectrum
is $(\sigma_t^2)^{1/4}/(\sigma_b^2)^{1/2} \simeq 1.7\, f_{\rm sky}^{1/4} \left[ L_{\min} \ln(L_{\max}/L_{\min}) \right]^{1/2}$.
For reasonable values of $L_{\min}$ and $L_{\max}$, the smallest $f_{\rm nl}$
detectable with the bispectrum is smaller, by a factor of
order a few, than that detectable with the trispectrum.

We can now derive an approximation for $\widehat{(f_{\rm nl}^2)}^{\,t}$ noting
that the variance, and thus the signal-to-noise, is dominated
by elongated quadrilaterals. From Eq. (30), and using
the squeezed limit for the trispectrum, we find,

(35)

where $X_{\vec L}$ is the quantity given in Eq. (29). Comparing
with the estimator, Eq. (28), we see that this estimator is
constructed from precisely the same sums of triangles as
the bispectrum estimator. Strictly speaking, the bispectrum
estimator for $f_{\rm nl}$ involves a sum over a huge number
of triangles; the number of such triangles scales as
$N_{\rm pix}^2/6$ with the number of pixels in the map. Likewise,
the trispectrum estimator for $f_{\rm nl}^2$ involves a sum over all
quadrilaterals, and the number of these scales as $N_{\rm pix}^3/24$.
Thus, one naively expects the correlation between the estimators
to be extremely weak, given the huge number
of bispectrum and trispectrum configurations. Eqs. (28)
and (35) show, however, that the quadrilateral configurations
that dominate the trispectrum estimator for $f_{\rm nl}^2$
are very closely related to the triangle configurations that
dominate the bispectrum estimator for $f_{\rm nl}$.
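The relative sensitivity of the two estimators quoted in this section can be reproduced from the two inverse variances. The following sketch is illustrative and rests on assumed closed forms, $\sigma_b^{-2} = (4/\pi) f_{\rm sky} A\, l_{\max}^2 \ln(L_{\max}/L_{\min})$ and $\sigma_t^{-2} = (2/\pi^2) f_{\rm sky} A^2 l_{\max}^4/L_{\min}^2$; the function names and the example parameters are hypothetical. It shows that $A$ and $l_{\max}$ cancel in the ratio and that the numerical prefactor is close to 1.7.

```python
import math


def sigma_b_inv2(f_sky, A, l_max, L_min, L_max):
    """Assumed bispectrum inverse variance for f_nl."""
    return (4.0 / math.pi) * f_sky * A * l_max ** 2 * math.log(L_max / L_min)


def sigma_t_inv2(f_sky, A, l_max, L_min):
    """Assumed trispectrum inverse variance for f_nl^2."""
    return (2.0 / math.pi ** 2) * f_sky * A ** 2 * l_max ** 4 / L_min ** 2


def detectability_ratio(f_sky, A, l_max, L_min, L_max):
    """(sigma_t^2)^(1/4) / (sigma_b^2)^(1/2): smallest f_nl detectable with the
    trispectrum relative to the smallest detectable with the bispectrum."""
    return (sigma_t_inv2(f_sky, A, l_max, L_min) ** -0.25
            * sigma_b_inv2(f_sky, A, l_max, L_min, L_max) ** 0.5)


def closed_form_ratio(f_sky, L_min, L_max):
    """Same ratio written as prefactor * f_sky^(1/4) * [L_min ln(L_max/L_min)]^(1/2)."""
    prefactor = (math.pi ** 2 / 2.0) ** 0.25 * (4.0 / math.pi) ** 0.5
    return prefactor * f_sky ** 0.25 * math.sqrt(L_min * math.log(L_max / L_min))


# With L_min = 1 and L_max = e the bracket equals 1, isolating the prefactor.
prefactor_only = closed_form_ratio(1.0, 1.0, math.e)

# Hypothetical survey: the ratio does not depend on A or l_max.
ratio_a = detectability_ratio(0.8, 6e-10, 2000, 2, 2000)
ratio_b = detectability_ratio(0.8, 1e-9, 500, 2, 2000)
print(prefactor_only, ratio_a, ratio_b, closed_form_ratio(0.8, 2, 2000))
```

For these example values the ratio is of order a few, in line with the statement that the bispectrum is somewhat more sensitive for detection.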



V. CORRELATION BETWEEN BISPECTRUM
AND TRISPECTRUM ESTIMATORS FOR $f_{\rm nl}$

Since the bispectrum and trispectrum estimators for
$f_{\rm nl}$ are both constructed from the same CMB map, it
is expected that there should be some correlation between
the two estimators. Eqs. (25) and (33) help clarify
the nature of the correlation. Clearly, if we use for the
bispectrum estimator only triangles that share a single
shortest side $\vec L$ and for the trispectrum estimator only
quadrilaterals with the same $\vec L$ as the shortest diagonal,
then the two estimators provide the same quantity, modulo
the difference between the magnitude $|T_{\vec L}|^2$ (from the
bispectrum estimator) and its expectation value $A/L^2$
(from the trispectrum estimator).

However, we have not only triangles/quadrilaterals
from a single $\vec L$ shortest side/diagonal, but those constructed
from many $\vec L$'s. The correlation between the
bispectrum and trispectrum estimators should thus decrease
as the number of $\vec L$ modes increases in the same
way that the means $\langle x\rangle$ and $\langle x^2\rangle$ measured with a large
number $N$ of data points $x_i$ will become uncorrelated as
$N$ becomes large.

Of course since $\hat f_{\rm nl}^{\,b}$ is linear in $T_{\vec L}$, the covariance between
$\hat f_{\rm nl}^{\,b}$ and $\widehat{(f_{\rm nl}^2)}^{\,t}$ will be zero. However, the correlation
between $(\hat f_{\rm nl}^{\,b})^2$ and $\widehat{(f_{\rm nl}^2)}^{\,t}$ will be nonzero. We thus
now estimate the magnitude of the correlation coefficient,
which we define as



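$$ r \equiv \frac{\left\langle \Delta\!\left[(\hat f_{\rm nl}^{\,b})^2\right] \Delta\!\left[\widehat{(f_{\rm nl}^2)}^{\,t}\right] \right\rangle}{\left\langle \left(\Delta\!\left[(\hat f_{\rm nl}^{\,b})^2\right]\right)^2 \right\rangle^{1/2} \left\langle \left(\Delta\!\left[\widehat{(f_{\rm nl}^2)}^{\,t}\right]\right)^2 \right\rangle^{1/2}}, \qquad (36) $$

where $\Delta(Q) \equiv Q - \langle Q\rangle$. To simplify the equations, we
can drop the prefactors in Eqs. (28) and (35) and deal
with quantities,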



L L 

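(37)

The desired correlation coefficient is then

$$ r = \frac{\left\langle \Delta(F^2)\, \Delta G \right\rangle}{\left\langle [\Delta(F^2)]^2 \right\rangle^{1/2} \left\langle (\Delta G)^2 \right\rangle^{1/2}}. \qquad (38) $$
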

We begin by noting that $X_{\vec L}$ is a random variable with
zero mean. In the large-$l_{\max}$ limit, it will be well approximated
by a Gaussian random variable, in which case

Some other useful relations include, 



<n - E {TtT z x z x l2 ) = nJ2* (xi 



(39) 
(40) 



(G 2 ) = E 7W2 



L\ ,L2 



T 2 T< 
lj \ L "X 



x% x% 



Li 



(^E^l^ 

(41) 



.4 



<^> - EEE^(^^)K^ 2 ^- 



h\ L>2 L3 



A A 



Li,L 2 



= ^E zfZf( x i x i) = ^ G2 )- (42) 



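Also, since $F$ is a sum over (approximately) Gaussian
random variables, it is also well approximated by a Gaussian
random variable, and so $\langle F^4\rangle \simeq 3\langle F^2\rangle^2$.
From these relations, it follows that

$$ \left\langle \Delta(F^2)\, \Delta G \right\rangle = \langle F^2 G\rangle - \langle F^2\rangle \langle G\rangle = \Omega \left[ \langle G^2\rangle - \langle G\rangle^2 \right], \qquad (43) $$

and thus that

$$ r = \frac{\Omega \left\langle (\Delta G)^2 \right\rangle^{1/2}}{\sqrt{2}\, \langle F^2\rangle} \propto \left[ L_{\min} \ln\!\left(L_{\max}/L_{\min}\right) \right]^{-1}. \qquad (44) $$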



Thus, if $L_{\max}$ is small, then the correlation will be
large. However, the correlation coefficient decreases as
$[\ln(L_{\max})]^{-1}$, and it will become negligible in the limit
that $L_{\max}$ is large.

Strictly speaking, the $X_{\vec L}$ are not entirely statistically
independent, as we have assumed here, as many are constructed
from the same measurements. They are also
not perfectly Gaussian, as we have assumed. However,
as we discuss in Appendix B, we have checked with a full
numerical calculation of the correlation coefficient that
the basic conclusions (and particularly the scaling of the
correlation coefficient $r$ with $L_{\max}$) are sound.



VI. CONCLUSIONS 

A large body of recent work has focused on tests of the 
local model for non-Gaussianity that can be performed 
with measurement of the CMB trispectrum and bispectrum. 
Here we have clarified how the bispectrum and 
trispectrum may provide statistically independent information 
on the local-model non-Gaussianity parameter 
$f_{\rm nl}$, even if the bispectrum estimator for $f_{\rm nl}$ saturates the 
Cramer-Rao bound. The basic point is that the Cramer-Rao 
inequality puts a lower limit to the variance with 
which a given parameter can be measured. If the likelihood 
function is precisely Gaussian, then the likelihood 
is described entirely by the variance. However, if the likelihood 
function is not precisely Gaussian, then there is 
more information in the likelihood beyond the variance 
(see, e.g., Section VI in Ref. [15]). In the current problem, 
this is manifest in that a statistically independent 
measurement of $f_{\rm nl}^2$ can be obtained from the trispectrum 
without contributing to the variance of $f_{\rm nl}$. 

We then built on an observation of Ref. [13] to illustrate 
the nature of the correlation between the bispectrum 
estimator for $f_{\rm nl}$ and the trispectrum estimator of 
$f_{\rm nl}^2$. This analysis demonstrates that the two estimators 
do indeed become statistically independent in the 
large-$l_{\rm max}$ limit. 

Throughout we have made the null hypothesis $f_{\rm nl} = 0$ 
to estimate the variances with which $f_{\rm nl}$ can be measured 
from the bispectrum and with which $f_{\rm nl}^2$ can be 
measured from the trispectrum. This is suitable if one 
is simply searching the data for departures from the null 
hypothesis. However, as emphasized by Ref. [13], the 
minimum-variance estimators constructed under the null 
hypothesis are no longer optimal if there is a strong signal. 
If so, then forecasts of signal-to-noise made with 
the null hypothesis are no longer valid in the limit of 
large signal-to-noise, and this calls into question claims 
Q that the trispectrum will provide a better probe of 
the local model in the large-S/N limit. In this limit, a 
new bispectrum estimator can be constructed to saturate 
the Cramer-Rao bound [HI], and an analogous optimal 
trispectrum estimator can in principle be found. Still, 
the observation that the bispectrum and trispectrum estimators 
in the local model are constructed from the same 
sums of triangles suggests that the precisions with which 
$f_{\rm nl}$ can be measured, in the high-S/N limit, from the 
bispectrum and trispectrum will be roughly comparable. 

Although we assumed the null hypothesis to argue that 
the bispectrum and trispectrum estimators for $f_{\rm nl}$ are 
independent, the same arguments should also apply in the 
high-S/N limit. For example, if the bispectrum estimator 
finds $f_{\rm nl}$ to be different from zero, with best-fit value 
$\bar f_{\rm nl}$, then the likelihood can be re-parametrized in terms 
of a quantity $\epsilon = f_{\rm nl} - \bar f_{\rm nl}$ that quantifies the departure 
from the new null hypothesis $f_{\rm nl} = \bar f_{\rm nl}$. Measurement 
of $\epsilon$ with the trispectrum can then be used to provide a 
statistically independent consistency check of the model. 
Or, in simpler terms, the skewness and kurtosis are still 
two statistically independent quantities that can be obtained 
from a measured distribution, even if the skewness 
(or kurtosis) of that distribution is nonzero. 

Throughout, we have made approximations and simplifications 
to make the basic conceptual points clear, 
and we have restricted our attention simply to the local 
model, which we have here defined to be 
$\Phi = \phi + f_{\rm nl}(\phi^2 - \langle \phi^2 \rangle)$. However, inflationary models predict a 
wider range of trispectra [16]. Likewise, analysis of real 
data will introduce a number of ingredients that we have 
excised from our simplified analysis. Still, we hope that 
the points we have made here may assist in the interpretation 
and understanding of experimental results and 
perhaps elucidate statistical tests of other, more general, 
non-Gaussian models. 



Appendix A: The continuum-discretuum connection 

In this paper we have chosen to work with discrete 
Fourier transforms, for which the calculations of variances 
and covariances are more straightforward. Here we show 
how to relate the expressions for power spectra, bispectra, 
and trispectra in this discrete formalism to the 
continuum analysis discussed in most of the theoretical 
literature. 



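Following Ref. [13], we note that 

$$
T_{\vec l} = \int d^2\theta \, e^{-i \vec l \cdot \vec\theta} \, T(\vec\theta)
\simeq \frac{\Omega}{N_{\rm pix}} \sum_{\vec\theta} e^{-i \vec l \cdot \vec\theta} \, T(\vec\theta),
\tag{A1}
$$

where $\Omega = 4\pi f_{\rm sky}$ is the area of sky (in steradians) surveyed, 
from which we infer the correspondence 
$\sum_{\vec\theta} \leftrightarrow (N_{\rm pix}/\Omega) \int d^2\theta$. Likewise, 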



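$$
T(\vec\theta) = \int \frac{d^2 l}{(2\pi)^2} \, e^{i \vec l \cdot \vec\theta} \, T_{\vec l}
\simeq \frac{1}{\Omega} \sum_{\vec l} e^{i \vec l \cdot \vec\theta} \, T_{\vec l},
\tag{A2}
$$

from which we infer the correspondence 
$\sum_{\vec l} \leftrightarrow \Omega \int d^2 l/(2\pi)^2$. The Dirac delta function is then written 
in the discrete formalism as a Kronecker delta as follows: 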



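$$
(2\pi)^2 \, \delta_D(\vec l - \vec l') = \int d^2\theta \, e^{i \vec\theta \cdot (\vec l - \vec l')}
\simeq \frac{\Omega}{N_{\rm pix}} \sum_{\vec\theta} e^{i \vec\theta \cdot (\vec l - \vec l')}
= \Omega \, \delta_{\vec l, \vec l'}.
\tag{A3}
$$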

The definitions in Section HTT1 of the power spectrum, bis- 
pectrum, and trispectrum follow from this relation. 
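
The Kronecker-delta correspondence can be checked directly on a small grid. The sketch below assumes, for illustration only, a square periodic patch of side `side` (so that $\Omega$ is `side**2`), `n_side` pixels per side, and modes $\vec l = (2\pi/{\rm side}) \times$ (integer, integer). The function name `discrete_delta` and its parameters are hypothetical.

```python
import cmath
from math import pi


def discrete_delta(m1, m2, n_side=8, side=0.1):
    """Evaluate (Omega / N_pix) * sum_theta exp(i theta.(l - l')) on a square patch.

    m1 and m2 are integer pairs labelling the modes l and l'.
    """
    omega = side * side          # patch area in steradians
    n_pix = n_side * n_side      # number of pixels
    dl = 2 * pi / side           # spacing between allowed modes
    dtheta = side / n_side       # pixel spacing
    total = 0j
    for a in range(n_side):
        for b in range(n_side):
            phase = dl * dtheta * ((m1[0] - m2[0]) * a + (m1[1] - m2[1]) * b)
            total += cmath.exp(1j * phase)
    return omega / n_pix * total


if __name__ == "__main__":
    print(discrete_delta((1, 2), (1, 2)))  # equal modes
    print(discrete_delta((1, 2), (3, 2)))  # different modes
```

For equal modes the sum returns $\Omega$, and for distinct modes (whose integer labels do not differ by a multiple of `n_side`) it vanishes up to rounding.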

One advantage of this formulation is that equations 
can be checked for consistency using dimensional analysis. 
Recalling that $\theta$ has units $[\theta^2]$ = sterad and that 
temperature has units $[T(\vec\theta)]$ = K, it follows, for example, that 
$[T_{\vec l}]$ = K sterad, $[C_l]$ = K$^2$ sterad, $[f_{\rm nl}]$ = K$^{-1}$, $[B]$ = K$^3$ sterad$^2$, 
and $[\mathcal{T}]$ = K$^4$ sterad$^3$. As another check, the 
variance and covariances should have an appropriate scaling 
with $f_{\rm sky}$ if factors of $\Omega$ are carried properly through 
the calculation. 
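
This dimensional bookkeeping can be automated with a few lines of code. The sketch below tracks units as (power of K, power of sterad) pairs. It assumes the convention that each connected $n$-point function of the $T_{\vec l}$ equals $\Omega$ times a Kronecker delta times the corresponding spectrum, and that $f_{\rm nl}$ multiplies a quantity quadratic in temperature. All helper and variable names are hypothetical.

```python
def mul(*units):
    """Multiply quantities: add the powers of (K, sterad)."""
    return (sum(u[0] for u in units), sum(u[1] for u in units))


def div(a, b):
    """Divide quantities: subtract the powers of (K, sterad)."""
    return (a[0] - b[0], a[1] - b[1])


KELVIN = (1, 0)
STERAD = (0, 1)

T_MAP = KELVIN                   # [T(theta)] = K
OMEGA = STERAD                   # survey area
T_L = mul(T_MAP, STERAD)         # T_l is an integral of T(theta) over d^2 theta

# Assumed convention: <T_l ... T_l> = Omega * delta * spectrum.
C_L = div(mul(T_L, T_L), OMEGA)
BISPEC = div(mul(T_L, T_L, T_L), OMEGA)
TRISPEC = div(mul(T_L, T_L, T_L, T_L), OMEGA)

# f_nl times (temperature)^2 must itself be a temperature.
F_NL = div(T_MAP, mul(T_MAP, T_MAP))

if __name__ == "__main__":
    for name, unit in [("T_l", T_L), ("C_l", C_L), ("f_nl", F_NL),
                       ("B", BISPEC), ("trispectrum", TRISPEC)]:
        print(f"[{name}] = K^{unit[0]} sterad^{unit[1]}")
```

The same helpers can be applied to any estimator or variance to confirm that the powers of K and sterad balance.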



Appendix B: Full correlation between trispectrum 
and bispectrum estimators 

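As discussed in the text, the minimum-variance bispectrum 
and trispectrum estimators for $f_{\rm nl}$ are given by 

$$
\widehat{f_{\rm nl}^{b}} = \sigma_b^2 \sum_{\vec l_1 + \vec l_2 + \vec l_3 = 0}
\frac{B(l_1, l_2, l_3)/f_{\rm nl}}{3! \, \Omega^2 \, C_{l_1} C_{l_2} C_{l_3}} \,
T_{\vec l_1} T_{\vec l_2} T_{\vec l_3},
\tag{B1}
$$

$$
\widehat{(f_{\rm nl}^2)^{t}} = \sigma_t^2 \sum_{\vec l_1 + \vec l_2 + \vec l_3 + \vec l_4 = 0}
\frac{\mathcal{T}(\vec l_1, \vec l_2, \vec l_3, \vec l_4)/f_{\rm nl}^2}{4! \, \Omega^3 \, C_{l_1} C_{l_2} C_{l_3} C_{l_4}} \,
T_{\vec l_1} T_{\vec l_2} T_{\vec l_3} T_{\vec l_4},
\tag{B2}
$$

where $\sigma_b^2$ and $\sigma_t^2$ are the variances of the bispectrum and 
trispectrum estimators. Here we sum over all triangles 
and quadrilaterals (not just those with no equal sides), 
and the factors of 3! and 4! take into account double 
counting of degenerate terms in the sum and permutation 
factors for triangles and quadrilaterals with equal 
sides. In Sec. V we used the squeezed-limit approximation 
to estimate the correlation coefficient between $(\widehat{f_{\rm nl}^{b}})^2$ 
and $\widehat{(f_{\rm nl}^2)^{t}}$. In this Appendix we derive the full expression 
for this correlation coefficient and verify that the 
approximations made in Sec. V are valid. 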

The covariance will consist of a weighted sum of the 10-point 
function. However, because no two indices in the 
trispectrum estimator or in each bispectrum estimator 
can add to zero, we know that two of the bispectrum 
indices must combine. The rest of the covariance will 
then be diagonal, leading to 

$$
\left\langle (\widehat{f_{\rm nl}^{b}})^2 \, \widehat{(f_{\rm nl}^2)^{t}} \right\rangle \propto
\sum_{\vec l_1 + \vec l_2 = -\vec L, \; \vec l_3 + \vec l_4 = \vec L}
\frac{B(L, l_1, l_2) \, B(L, l_3, l_4) \, \mathcal{T}(\vec l_1, \vec l_2, \vec l_3, \vec l_4)/f_{\rm nl}^4}
{\Omega^2 \, C_L C_{l_1} C_{l_2} C_{l_3} C_{l_4}}.
\tag{B3}
$$



Finally, we need to compute the variance of $(\widehat{f_{\rm nl}^{b}})^2$. To 
do this we must compute the 12-point function 



B(li,fa, fa)B(li,ti,t2)B(l2,ti,ta)B(l3,t2,t3) 

X +T2+I3 ,0^-fi +ti +t 2 -?i+? 3 ,0^r 3 +f 2 +t 3 ,0 

+ ^B{h,l 2 M)B{hhti)B{t 2 , t 3 ,l 3 )B(t 2 , hM) 

(B6) 



T r T r T r \T T T r T r \T f T r T r \T? T r T r 



(B4) 



where all temperatures within each group of three sepa- 
rated by a '|' have zero covariance. The variance takes 
the form 



6\2 



(/nl) 



= ^ 

{It} 



(B5) 



t\t 2 t 3 



^l 2 Ci x Ci 2 Gi 3 Ct^Ct 2 Ct 3 ' 



Numerically evaluating the sum in Eq. (B6) shows that 
for $l_{\rm max} > 100$ the second (non-Gaussian) term contributes 
less than 1% to the variance of $(\widehat{f_{\rm nl}^{b}})^2$. We have 
moreover numerically evaluated the exact expression for 
the correlation coefficient and verified that, as our estimates 
indicate, the correlation is of order $\lesssim 10\%$ for 



2 and $l_{\rm max} > 100$. 